EXTRACTING CONTENT IN REGIONAL WEB DOCUMENTS WITH TEXT VARIATIONS

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Content Reliability Estimation in Web Documents: A New Proposal

This paper illustrates how a combination of information retrieval, machine learning, and NLP corpus annotation techniques was applied to a problem of text content reliability estimation in Web documents. Our proposal for text content reliability estimation is based on a model in which reliability is a similarity measure between the content of the documents and a knowledge corpus. The proposal i...

متن کامل

Extracting Financial Information from Text Documents

The majority of electronic data today is in textual form. Financial data such as articles in the Wall Street Journal are written as texts. These electronic documents contain a wealth of information but require human interpretation. For financial analysis, rapid up-to-date information is critical. Most software tools currently require data which are better structured than text (such as data in r...

متن کامل

Extracting Interlinear Glossed Text from LaTeX Documents

We present texigt, a command-line tool for the extraction of structured linguistic data from LTEX source documents, and a language resource that has been generated using this tool: a corpus of interlinear glossed text (IGT) extracted from open access books published by Language Science Press. Extracted examples are represented in a simple XML format that is easy to process and can be used to va...

متن کامل

Detecting and Extracting Events from Text Documents

Events of various kinds are mentioned and discussed in text documents, whether they are books, news articles, blogs or microblog feeds. The paper starts by giving an overview of how events are treated in linguistics and philosophy. We follow this discussion by surveying how events and associated information are handled in computationally. In particular, we look at how textual documents can be m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: International Journal on Information Sciences and Computing

سال: 2014

ISSN: 0973-9092

DOI: 10.18000/ijisac.50144